{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Series\n",
    "**Series是pandas的一种存储结构，一维数组，它可以包含任何数据类型的标签。我们主要使用它们来处理时间序列数据**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1、 创建series Series([ ])\n",
    "\n",
    "**创建一个series,获取series的名称和索引**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0    1.0\n",
      "1    2.0\n",
      "2    NaN\n",
      "3    4.0\n",
      "4    5.0\n",
      "dtype: float64\n",
      "None\n",
      "RangeIndex(start=0, stop=5, step=1)\n"
     ]
    }
   ],
   "source": [
    "s = pd.Series([1, 2, np.nan, 4, 5])\n",
    "print(s)\n",
    "print(s.name)\n",
    "print(s.index)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2、 添加名称，修改索引"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "series name: Price Series\n",
      "new index: DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',\n",
      "               '2016-01-05'],\n",
      "              dtype='datetime64[ns]', freq='D')\n",
      "2016-01-01    1.0\n",
      "2016-01-02    2.0\n",
      "2016-01-03    NaN\n",
      "2016-01-04    4.0\n",
      "2016-01-05    5.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "s.name = \"Price Series\"\n",
    "print(\"series name:\",s.name)\n",
    "new_index = pd.date_range(\"20160101\",periods=len(s), freq=\"D\")\n",
    "s.index = new_index\n",
    "print(\"new index:\",s.index)\n",
    "print (s)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3、 访问序列元素\n",
    "\n",
    "系列的访问通常使用 **iloc[ ]** 和 **loc[ ]** 的方法。我们使用iloc[]来访问元素的整数索引和我们使用loc[]来访问序列的索引\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### iloc\n",
    "**访问单个整数索引**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2016-01-01    1.0\n",
      "2016-01-02    2.0\n",
      "2016-01-03    NaN\n",
      "2016-01-04    4.0\n",
      "2016-01-05    5.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n",
      "First element of the series:  1.0\n",
      "Last element of the series:  5.0\n"
     ]
    }
   ],
   "source": [
    "print (s)\n",
    "print(\"First element of the series: \", s.iloc[0])\n",
    "print(\"Last element of the series: \", s.iloc[len(s)-1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### iloc\n",
    "**访问范围的整数索引，从0到5，间隔2**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2016-01-01    1.0\n",
      "2016-01-02    2.0\n",
      "2016-01-03    NaN\n",
      "2016-01-04    4.0\n",
      "2016-01-05    5.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n",
      "2016-01-01    1.0\n",
      "2016-01-03    NaN\n",
      "2016-01-05    5.0\n",
      "Freq: 2D, Name: Price Series, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print (s)\n",
    "print(s.iloc[0:5:2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### loc\n",
    "**访问单个与范围的序列**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2016-01-01    1.0\n",
      "2016-01-02    2.0\n",
      "2016-01-03    NaN\n",
      "2016-01-04    4.0\n",
      "2016-01-05    5.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n",
      "1.0\n",
      "2016-01-02    2.0\n",
      "2016-01-03    NaN\n",
      "2016-01-04    4.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print (s)\n",
    "print(s.loc['20160101'])\n",
    "print(s.loc['20160102':'20160104'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4、 布尔索引\n",
    "\n",
    "除了上述访问方法,您可以使用布尔过滤序列数组。比较序列与标准是否一致。当与您设定的任何条件相比,这次你返回另一个序列中,回填满了布尔值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2016-01-01    1.0\n",
      "2016-01-02    2.0\n",
      "2016-01-03    NaN\n",
      "2016-01-04    4.0\n",
      "2016-01-05    5.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n",
      "2016-01-01     True\n",
      "2016-01-02     True\n",
      "2016-01-03    False\n",
      "2016-01-04    False\n",
      "2016-01-05    False\n",
      "Freq: D, Name: Price Series, dtype: bool\n",
      "2016-01-01    1.0\n",
      "2016-01-02    2.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n",
      "2016-01-02    2.0\n",
      "Freq: D, Name: Price Series, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print (s)\n",
    "print(s < 3)\n",
    "print(s.loc[s < 3])\n",
    "print(s.loc[(s < 3) & (s > 1)])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "缺失数据\n",
    "当我们处理实际数据,有一个非常现实的遭遇缺失值的可能性。pandas提供我们处理它们的方法，我们有两个处理缺失数据的主要手段，一个是fillna,另一个是dropna。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5、 使用Series处理时间序列\n",
    "\n",
    "**读取excel数据并进行抽样resample()**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                      close    high     low    open    volume\n",
      "datetime                                                     \n",
      "2017-01-03 15:00:00  115.99  117.06  115.14  115.43  16232125\n",
      "2017-01-04 15:00:00  116.28  116.42  115.21  115.99  29656234\n",
      "2017-01-05 15:00:00  116.07  116.64  115.64  116.07  26436646\n",
      "2017-01-06 15:00:00  115.21  116.07  114.86  116.07  17195598\n",
      "2017-01-09 15:00:00  115.35  115.99  114.86  115.64  14908745\n",
      "2017-01-10 15:00:00  115.28  115.64  114.93  115.21   7996636\n",
      "2017-01-11 15:00:00  115.07  115.64  115.00  115.64   9166532\n",
      "2017-01-12 15:00:00  114.78  115.35  114.71  115.21   8295650\n",
      "2017-01-13 15:00:00  115.85  115.99  114.64  114.64  19024943\n",
      "2017-01-16 15:00:00  117.92  118.20  114.64  115.57  53249124\n",
      "2017-01-17 15:00:00  116.85  117.77  116.56  117.21  12555292\n",
      "2017-01-18 15:00:00  117.42  117.85  116.49  116.92  11478663\n",
      "2017-01-19 15:00:00  117.77  118.49  116.99  116.99  12180687\n",
      "2017-01-20 15:00:00  118.06  118.63  117.49  118.06  14285968\n",
      "2017-01-23 15:00:00  117.99  118.84  117.56  118.63  14615740\n",
      "2017-01-24 15:00:00  118.91  118.91  118.06  118.06  14985241\n",
      "2017-01-25 15:00:00  118.91  119.20  118.27  118.84  11284869\n",
      "2017-01-26 15:00:00  119.41  119.91  118.27  118.84   8602907\n",
      "2017-02-03 15:00:00  118.42  119.98  118.34  119.77   8171489\n",
      "2017-02-06 15:00:00  118.63  119.48  118.63  119.27  13455250\n",
      "2017-02-07 15:00:00  118.77  119.20  118.42  118.56  14757284\n",
      "2017-02-08 15:00:00  118.63  118.84  117.77  118.42  11238767\n",
      "2017-02-09 15:00:00  119.06  119.41  118.13  118.77  11393034\n",
      "2017-02-10 15:00:00  119.48  119.91  118.91  119.34  13983062\n",
      "2017-02-13 15:00:00  119.98  120.34  119.48  120.20  19992372\n",
      "2017-02-14 15:00:00  119.34  120.20  119.20  120.12  12987135\n",
      "2017-02-15 15:00:00  119.98  120.55  119.27  119.77  25687112\n",
      "2017-02-16 15:00:00  119.48  120.41  119.34  120.20  16325732\n",
      "2017-02-17 15:00:00  118.56  119.77  118.13  119.48  13863642\n",
      "2017-02-20 15:00:00  120.55  120.91  118.34  118.34  29915560\n",
      "...                     ...     ...     ...     ...       ...\n",
      "2017-10-10 15:00:00  122.81  122.81  121.78  122.44  13475400\n",
      "2017-10-11 15:00:00  122.44  122.91  122.16  122.34   9654900\n",
      "2017-10-12 15:00:00  122.34  122.72  121.59  122.34   8363600\n",
      "2017-10-13 15:00:00  121.31  122.62  121.22  122.16  11271700\n",
      "2017-10-16 15:00:00  122.25  122.44  121.31  121.59  11832600\n",
      "2017-10-17 15:00:00  121.78  122.44  121.41  122.16   7934100\n",
      "2017-10-18 15:00:00  122.53  122.72  121.22  121.87  22599700\n",
      "2017-10-19 15:00:00  123.09  123.37  121.69  122.25  28931900\n",
      "2017-10-20 15:00:00  121.97  122.81  121.97  122.53   8716900\n",
      "2017-10-23 15:00:00  120.37  122.16  120.28  122.06  15590300\n",
      "2017-10-24 15:00:00  120.56  121.41  120.19  120.37  12571800\n",
      "2017-10-25 15:00:00  120.94  121.31  120.19  120.56  10200400\n",
      "2017-10-26 15:00:00  120.19  120.75  119.81  120.75  12938000\n",
      "2017-10-27 15:00:00  120.47  121.31  120.19  120.37  15482700\n",
      "2017-10-30 15:00:00  119.06  120.19  118.03  120.19  37086800\n",
      "2017-10-31 15:00:00  118.22  118.69  117.94  118.22   9330200\n",
      "2017-11-01 15:00:00  117.56  119.25  117.47  118.12  16948000\n",
      "2017-11-02 15:00:00  117.47  117.75  116.53  117.37  23219200\n",
      "2017-11-03 15:00:00  117.94  118.12  116.53  117.47  15786000\n",
      "2017-11-06 15:00:00  116.91  117.56  116.72  117.56   9785200\n",
      "2017-11-07 15:00:00  117.56  118.12  116.34  116.91  19003800\n",
      "2017-11-08 15:00:00  117.94  118.87  117.19  117.47  18500100\n",
      "2017-11-09 15:00:00  117.66  118.41  117.47  117.84   8739900\n",
      "2017-11-10 15:00:00  118.41  118.41  116.81  117.56  24748600\n",
      "2017-11-13 15:00:00  120.00  120.47  118.41  118.59  41250100\n",
      "2017-11-14 15:00:00  118.12  119.72  117.94  119.62  17172100\n",
      "2017-11-15 15:00:00  118.12  118.41  117.66  117.84  14029600\n",
      "2017-11-16 15:00:00  116.16  117.75  116.06  117.75  18042800\n",
      "2017-11-17 15:00:00  119.81  120.00  116.25  116.25  53475100\n",
      "2017-11-20 15:00:00  120.47  120.56  118.22  118.97  29413900\n",
      "\n",
      "[215 rows x 5 columns]\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "data = pd.read_excel('sz50.xlsx', sheetname=0, index_col='datetime')\n",
    "print(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#只保留data中的close，获取data的数据类型与前5个值："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "Series = data.close"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime\n",
       "2017-01-03 15:00:00    115.99\n",
       "2017-01-04 15:00:00    116.28\n",
       "2017-01-05 15:00:00    116.07\n",
       "2017-01-06 15:00:00    115.21\n",
       "2017-01-09 15:00:00    115.35\n",
       "Name: close, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Series.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "用resample给每个月的最后一天抽样。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "datetime\n",
      "2017-01-31    119.41\n",
      "2017-02-28    118.06\n",
      "2017-03-31    114.00\n",
      "2017-04-30    108.30\n",
      "2017-05-31    120.37\n",
      "Freq: M, Name: close, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "monthly_prices = Series.resample('M').last()\n",
    "print(monthly_prices.head(5))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "datetime\n",
       "2017-01-31    116.565\n",
       "2017-02-28    118.985\n",
       "2017-03-31    115.500\n",
       "2017-04-30    109.800\n",
       "2017-05-31    107.700\n",
       "Freq: M, Name: close, dtype: float64"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "monthly_prices_med = Series.resample('M').median()\n",
    "monthly_prices_med.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 6、 缺失数据处理\n",
    "\n",
    "**当我们处理实际数据,有一个非常现实的遭遇缺失值的可能性。pandas提供我们处理它们的方法，我们有两个处理缺失数据的主要手段，一个是fillna,另一个是dropna。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "datetime\n",
      "2017-01-03    115.99\n",
      "2017-01-04    116.28\n",
      "2017-01-05    116.07\n",
      "2017-01-06    115.21\n",
      "2017-01-07       NaN\n",
      "2017-01-08       NaN\n",
      "2017-01-09    115.35\n",
      "Freq: D, Name: close, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "from datetime import datetime\n",
    "data_s= Series.loc[datetime(2017,1,1):datetime(2017,1,10)]\n",
    "data_r=data_s.resample('D').mean() #插入每一天\n",
    "print(data_r.head(10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**使用dropna()方法删除缺失值**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "datetime\n",
      "2017-01-03    115.99\n",
      "2017-01-04    116.28\n",
      "2017-01-05    116.07\n",
      "2017-01-06    115.21\n",
      "2017-01-09    115.35\n",
      "Name: close, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(data_r.head(10).dropna())  #去掉缺失值"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**填写缺失的数据 fillna()**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "datetime\n",
      "2017-01-03    115.99\n",
      "2017-01-04    116.28\n",
      "2017-01-05    116.07\n",
      "2017-01-06    115.21\n",
      "2017-01-07    115.21\n",
      "2017-01-08    115.21\n",
      "2017-01-09    115.35\n",
      "Freq: D, Name: close, dtype: float64\n"
     ]
    }
   ],
   "source": [
    "print(data_r.head(10).fillna(method='ffill'))  #填写缺失的天为前一天的价格。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
